INTRODUCTION


Symbolic  computation  differs  from  its  conventional 
numeric  cousin  in  several  fundamental  ways.  Foremost  among 
the  differences  is  the  set  of  applications  each  intends  to 
address.  Typical  symbolic  computing  applications  include 
logical  inference,  information  extraction,  problem  solving, 
and  text/speech/ image  understanding.  These  applications 
typically  require  the  processing  of  large  amounts  of 
information.  In  addition,  many  symbolic  computing 
application  environments  are  interactive  and  characterized 
by  real-time  performance  requirements. 

Using  conventional  serial  hardware,  however,  the  time 
it  takes  to  process  large  amounts  of  symbolic  information 
precludes  real-time  applications  because  the  minimum  time 
interval  is  fixed  by  practical  considerations.  For 
instance, multiplying two matrices takes O(N^3) steps, where
N x N is the number of elements in each matrix, and serial
sorting can be done in no less than O(N log N) time steps,
where  N  is  the  length  of  the  list.  In  addition,  serial 
operations  can  only  be  pipelined  to  a  limited  degree: 
usually  a  memory  fetch  can  occur  while  the  cpu  is  processing 
another  piece  of  data,  but  only  one  processing  step  can  be 
performed  at  a  time.  Fortunately  many  symbolic  computing 
operations  have  parallel  algorithms  that  are  pipelinable, 
and  thus  may  run  faster  on  parallel  machines. 

Symbolic  computing  applications  that  are  parallelizable 
include  calculating  the  transitive  closure,  shortest  path  or 
connected  components  of  a  relational  graph.  In  addition, 
pruning  a  graph  by  consistent  labeling  with  parallel  matrix 
operations  may  reduce  subsequent  graph  search  times. 
Parallel  algorithms  for  image  and  signal  recognition  include 
filtering and large kernel convolution and correlation. The
logical set and relational algebra operations like
intersection, union, division, projection, join, and
Cartesian product also can be sped up by parallel
algorithms.  In  the  rest  of  this  report  we  will  focus  on 
sorting,  which  is  common  to  both  symbolic  and  conventional 
computation.  We  will  begin  by  reviewing  the  existing 
parallel  sorting  algorithms.  Parallel  algorithms  for  the 
other  symbolic  computing  operations  will  be  the  subject  of 
future  research. 

Because  of  the  real-time  and  large  N  constraints  of 
symbolic  computation,  we  will  confine  our  discussions  to 
parallel  algorithms  where  the  sorting  time  grows  sublinearly 
with N: this immediately excludes linear arrays. Usually the
uniform  cost  criterion  is  assumed  when  comparing  algorithms, 
where  all  steps  of  the  computation  are  of  the  same  duration 
and  processing,  interconnect,  and  memory  elements  and 
operations  are  equally  costly.  Of  course  the  uniform  cost 
criterion  is  not  applicable  to  systems  with  varying  hardware 
characteristics. In addition, all relative order arguments
only  apply  for  asymptotic  values  of  N  where  constants  and 
lower  order  terms  are  ignored.2  However,  N  is  bounded  by 
technology  constraints  so  the  magnitude  of  the  constants  can 
be  important  when  comparing  real  systems.  Clearly 
technology  and  architecture  dependent  constants  and  relative 
costs  are  critical  in  a  meaningful  trade-off  analysis 
between  sorting  systems.  The  following  analysis  will  focus 
on  these  subtleties  and  result  in  the  specification  of 
optimal  sorting  systems  with  regard  to  the  requirements  of 
symbolic  computing  applications. 

Sorting can be performed on a 2-D array of processing
elements in sublinear O(N^(1/2)) time.3 The nearest-neighbor
communication of meshes allows the minimum temporal increment
to  be  extremely  small.  Moreover,  the  2-D  topology  makes 
them  particularly  well  suited  for  implementation  with  2-D 
technologies  like  electronics.  While  simple  matrix 
operations  can  be  pipelined  for  high  throughput,  most 
complex  mesh  algorithms  like  sorting  are  not  pipelinable; 
and  therefore,  the  system  throughput  equals  the  latency.  In 
the  high  performance  sorting  applications  found  in  symbolic 
computing,  achieving  a  modest  sublinear  temporal  complexity 
without  pipelining  is  inadequate.  Hence  we  must  consider 
alternative  sorting  architectures. 

The most powerful class of parallel algorithms is
based on reconfigurable global communications between
parallel  processing  elements  and  a  common  memory;  hence  they 
are  called  shared  memory  machines.  The  processing  elements 
are  fully  connected  through  the  memory  and  allow  varying 
degrees  of  simultaneous  memory  reads  and  writes.  Of  the 
three  general  classes  of  parallel  algorithms,  shared  memory 
computer  algorithms  can  perform  operations  with  the  lowest 
number  of  time  steps.  For  instance,  sorting  can  be 
performed  in  O(logN)  time  and  many  graph  problems  benefit 
from  shared  memory.  Abstract  shared  memory  machines  can 
also  simulate  both  the  mesh  and  network  computers  with  no 
time  delay. 

While  the  temporal  complexity  of  shared  memory 
algorithms  may  be  low,  they  usually  require  significantly 
more  spatial  resources  than  the  mesh  algorithms.  In 
addition  to  limiting  the  size  of  the  shared  memory 
implementations,  there  is  also  a  breakdown  of  the  uniform 
cost  criterion  when  comparing  shared  memory  machines  with 
meshes due to the globally reconfigurable interconnect.
Global  reconfiguration  takes  much  more  time  in  real  systems 
than  a  simple  nearest  neighbor  communication:  the 
reconfiguration  time  increases  substantially  with  N,  the 
number  of  processing  elements  in  the  architecture.  Thus 
shared memory implementations will be limited by practical
considerations to small-scale parallel architectures, which
will not be optimal for sorting applications in symbolic
computing.

Fortunately  there  is  an  alternative  to  the  mesh  and 
shared  memory  approaches  to  parallel  computer  architecture 
which  we  call  network  architectures.  Network  architectures 
are  characterized  by  fixed,  global  communications  between 
simple  parallel  processing  elements.  Sorting  algorithms 
exist  for  network  architectures  that  can  be  pipelined,  have 
low delay O(log^2 N), and moderate spatial complexity
O(N log^2 N). However, in electronic network implementations,
the  global  communications  limit  the  maximum  N  and  the 
minimum  temporal  interval  from  above  and  below  respectively. 
Optical  network  implementations  on  the  other  hand  are 
capable  of  building  very  large  networks  where  the  minimum 
temporal  interval  is  limited  by  the  propagation  distance  and 
the  speed  of  light.  At  one  nanosecond  per  foot,  connection 
occurs  quite  rapidly  in  either  fiber  optic,  bulk  optical, 
or  holographic7  systems.  Thus  the  minimum  temporal 
increment  and  N  of  optical  networks  can  approach  that  of 
electronic  meshes.  The  network  architectures  also  obtain  a 
low  temporal  complexity  at  the  cost  of  reasonable  spatial 
complexity  while  retaining  the  ability  to  pipeline  the 
sorting  operation. 
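
The orders of growth quoted above for the mesh, shared memory and network approaches can be compared numerically. The following is only a rough sketch: the function name step_counts is hypothetical, logarithms are assumed to be base 2, and every constant factor is set to one, which is exactly the simplification the text warns about when real systems are compared.

```python
import math


def step_counts(n):
    """Leading-order step counts for sorting n items (all constants set to 1)."""
    log_n = math.log2(n)
    return {
        "serial": n * log_n,       # O(N log N) serial sort
        "mesh": math.sqrt(n),      # O(N^(1/2)) sort on a 2-D mesh
        "network": log_n ** 2,     # O(log^2 N) delay of a sorting network
        "shared_memory": log_n,    # O(log N) shared memory sort
    }


if __name__ == "__main__":
    for n in (2 ** 10, 2 ** 20):
        counts = step_counts(n)
        print(n, {name: round(value) for name, value in counts.items()})
```

Under these assumptions the mesh count is smaller than the network count at N = 2^10 and larger at N = 2^20, so the asymptotic ranking only emerges for large N.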

From  the  preceding  discussion  it  appears  that  network 
sorting  algorithms  are  optimal  for  symbolic  computing 
applications.  In  addition,  optical  networks  are  favored 
over  their  electronic  counterparts  because  of  their  large 
size  and  bandwidth.  The  remainder  of  this  report  is  devoted 
to  issues  concerning  the  optical  implementation  of  network 
sorting  algorithms.  The  first  section  details  the  design  of 
optical  implementations  of  the  active  portion  of  the  network 
sorting  algorithms,  the  compare-and-exchange  operation.  We 
propose  using  a  distinctive  feature  of  optical  devices, 
namely bistability, that enables the construction of simple,
hardwired  circuits.  At  the  end  of  this  section  we  show  how 
the  properties  of  optical  device  families  can  be  used  to 
project  the  application  domains  of  the  resulting  sorting 
networks.  This  paper  has  been  accepted  to  a  special  issue 
of  Applied  Optics  on  optical  computing. 

In  the  second  and  last  section  we  demonstrate  an  all- 
optical  implementation  of  the  compare-and-exchange  operation 
using  ZnS  interference  filters.  This  was  a  collaborative 
effort  between  BDM  and  the  Optical  Circuitry  Cooperative  of 
the  Optical  Sciences  Center  at  the  University  of  Arizona. 
This  paper  has  been  accepted  to  a  special  issue  of  the  IEEE 
Journal  of  Selected  Areas  in  Communications  on  photonic 
switching. 
ABSTRACT

Sorting  is  central  to  the  solution  of  many  knowledge-based  and 
switching  problems  in  advanced  computation  and  communication  systems. 
Parallel-pipelined  sorting  algorithms  are  appropriate  for  applications 
that demand high throughput, low delay and many data channels. One
such  algorithm,  the  bitonic  sort,  can  be  implemented  with  passive 
perfect  shuffle  interconnects  between  active  stages  of  compare-and- 
exchange  elements.  In  this  paper  we  focus  on  optical  hardware  to 
implement  the  C&E  operation  and  show  that  by  taking  advantage  of  a 
distinctive  feature  of  optical  logic,  namely  bistability,  comparison 
circuits  of  remarkable  simplicity  are  attainable.  We  describe 
implementations  of  C&E  in  a  variety  of  optical  device  technologies 
capable  of  performing  latching  and  nonlatching  logic.  Based  on  the 
device characteristics we outline potential application areas for each
technology.


INTRODUCTION 


In the early seventies it was estimated that 25% of all computer
time was devoted to sorting.1 With the widespread application of
dedicated micro-controllers it is unlikely that this is still true;
however,  sorting  remains  one  of  the  most  common  tasks  in  general- 
purpose  computation.  For  instance,  databases  and  expert  systems  often 
sort  the  elements  of  a  data  structure  to  simplify  searching  and  the 
addition  of  new  elements.  Furthermore,  data  manipulation  operations 
like projection, set union and intersection can be directly
implemented by modified sorting algorithms.2 Typically, knowledge-based
systems operate on large numbers of related elements of
information.  As  a  general  rule  the  number  of  parallel  steps  necessary 
to  sort  a  data  structure  depends  on  the  number  of  elements  and  the 
faster  a  sorting  algorithm  is,  the  more  resources  it  requires.  In 
other  words  the  temporal  complexity  of  the  sorting  problem  is  reduced 
at  the  expense  of  increased  spatial  complexity.  Thus  faced  with  a 
maximum  amount  of  spatial  resources  and  a  minimum  switching  delay 
allowed  by  device-physics  considerations,  the  time  it  takes  to  sort 
large  structures  in  knowledge-based  systems  can  prohibit  real-time 
application. 

In  addition  to  its  widespread  use  in  computation,  sorting  is  also 
important  in  communications.3  In  particular,  parallel  processor 
architectures can be interconnected by pipelined sorting networks
serving as message passing systems.4 Similarly, telecommunication
packet switches can be based on sorting networks.5 Just as in
knowledge-based systems, the hardware sorters for massively parallel
architectures and subscriber loop communications must process large
numbers of parallel channels with low delay. More importantly,
however, the sorting networks in communications must keep up with the
data and packet generation rates--which can be considerable in large-grained
parallel architectures and in trunk and video
telecommunications.  Hence,  demands  on  the  throughput  of  the  sorting 
hardware  mandate  the  use  of  parallel,  pipelined  sorting  algorithms. 


A  sorting  algorithm  that  fulfills  the  combined  requirements  of 
low temporal and spatial complexity along with high throughput is the
bitonic network.3 The bitonic network can be pipelined in a
multistage architecture that requires global interconnects and active
compare-and-exchange (C&E) modules. It has been recognized that
optical  interconnects  can  provide  the  global  connections  needed 
between  the  stages.6.  3  In  addition,  previous  research  described 
optical  implementations  of  the  exchange  portion  of  the  active  modules 
using  polarization  switches,9  directional  couplers10  and  hybrid 
optoelectronic  circuits.11  In  this  paper  we  show  the  feasibility  of 
simple,  hardwired  implementations  of  C&E  in  which  all  the  processing 
is performed either optically or electrooptically. Specifically we
propose  using  a  family  of  devices  for  the  comparison  operation  that 
employs  bistability  to  combine  logic  and  memory  in  a  single  device-- 
obviating  the  need  for  external  feedback  as  in  flip-flops.  In  the 
next  section  we  review  the  properties  of  the  bitonic  algorithm  and  its 
implications  to  electronic  and  optical  implementations.  In  the  last 
section  we  describe  implementations  of  C&E  using  all-optical,  hybrid 
optoelectronic  and  electrooptic  logic  devices.  Based  on  the  device 
characteristics,  we  outline  the  properties  of  the  ensuing  networks  and 
their  potential  application  domains. 
PIPELINED BITONIC SORTING NETWORKS


A  bitonic  sequence  of  length  N  is  composed  of  two  sorted 
subsequences of length N/2, one monotonically increasing, the other
decreasing.  The  bitonic  merge  combines  the  subsequences  into  a  sorted 
sequence  of  length  N  using  logN  stages,  each  stage  composed  of 
compare-and-exchange  modules  and  fixed  interconnections  between 
stages.  The  bitonic  sorting  algorithm  for  an  arbitrarily  ordered 
input  list  uses  a  divide-and-conquer  strategy  which  begins  by  applying 
the  bitonic  merge  to  bitonic  sequences  of  length  2,  generating  sorted 
sequences  of  length  2  and  bitonic  sequences  of  length  4.  By  repeated 
application of the bitonic merge, sorted and bitonic sequences of
twice the length of those in the previous stage are formed. Thus, it
takes log N applications of the bitonic merge to produce a sorted
sequence of length N. Since the kth merge, which acts on sequences of
length 2^k, takes k steps, the time complexity of the bitonic sort is
O((log N)^2). If we have a pipelined bitonic sorting network then
there are N/2 compare-and-exchange modules per stage; thus, the
spatial complexity is O(N (log N)^2).
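
The stage structure described above can be sketched in software. This is a minimal illustration, not the report's hardware design: it assumes N is a power of two, uses the common index-XOR formulation of the bitonic network rather than the perfect-shuffle wiring, and the names bitonic_network and run_network are hypothetical.

```python
def bitonic_network(n):
    """Return the stages of a bitonic sorting network for n = 2**m inputs.

    Each stage is a list of (i, j, ascending) compare-and-exchange modules.
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    stages = []
    k = 2
    while k <= n:              # k: length of the sequences this merge produces
        j = k // 2
        while j >= 1:          # one network stage per value of j
            stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    stage.append((i, partner, ascending))
            stages.append(stage)
            j //= 2
        k *= 2
    return stages


def run_network(stages, data):
    """Apply every compare-and-exchange stage to a copy of data."""
    out = list(data)
    for stage in stages:
        for i, j, ascending in stage:
            # Exchange when the pair is out of order for its direction.
            if (out[i] > out[j]) == ascending:
                out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    net = bitonic_network(8)
    print(len(net), [len(stage) for stage in net])
    print(run_network(net, [5, 2, 7, 1, 8, 3, 6, 4]))
```

For N = 8 the sketch builds log N (log N + 1)/2 = 6 stages of N/2 = 4 modules each, consistent with a delay of order (log N)^2 and a module count of order N (log N)^2.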


Pipelined  bitonic  networks  are  difficult  to  implement  with 
electronic  technology  for  several  reasons.  Because  of  the  length  and 
complexity  of  the  interstage  interconnects,  2-D  layouts  of  the 
networks  require  area  and  communication  distances  that  grow  faster 
than N, the number of data channels.12 In such wire-limited
architectures  the  signal  propagation  delays  are  dependent  on  the 
number  of  channels  and  contribute  to  the  overall  sorting  delay. 
Similarly,  the  long  wires  require  large,  high-power  drivers  that 
dominate  the  total  system  power  for  large  values  of  N.  Moreover  the 
maximum signal bandwidth, and thus the throughput, is inversely proportional to
the  difference  in  length  of  the  wires  in  a  stage  or  their  RC  time 
constants,  whichever  is  larger.  With  the  addition  of  high-speed 
buffers  the  time-skew  limited  throughput  can  be  increased  at  the 
expense  of  increased  delay.  In  conclusion,  pipelined  bitonic  sorting 
networks  in  electronics  are  limited  to  applications  with  small  numbers 
of  data  channels  and  low  signal  bandwidths. 

Optical  technology,  on  the  other  hand,  is  well-suited  to 
implement  the  interconnects  needed  in  sorting  networks . * .  7 .  8  Each 
bitonic  interstage  connection  pattern  can  be  emulated  by  a  number  of 
perfect  shuffles  with  global,  space-variant  communications.13  Free- 
space  optical  implementations  of  the  perfect  shuffle  have  been 
demonstrated1 1 5  that  exploit  the  third  dimension  for  non-interacting 
communication channels. Since the active devices must share area only
with the connections' input and output transducers rather than the
connections themselves, sorters with large numbers of channels can be
fit into small areas and volumes. Besides the area and volume
advantages, 3-D interconnects permit the interstage delay to be
independent  of  the  number  of  channels.  Thus  in  contrast  to 
electronics,  the  overall  sorting  delay  grows  only  with  the  number  of 
stages.  Moreover  for  the  moderate  distances  present  in  these 
architectures,  the  optical  drive  power  is  independent  of  the 
communication  distance.  These  passive  optical  systems  also  have 
minimal  time  skew  and  may  communicate  information  at  optical  media 
bandwidths;  thus,  the  sorting  throughput  can  be  quite  large  and  is 
limited  in  practice  by  the  response  time  of  the  active  devices. 
Finally  due  to  the  prevalence  of  optical  technology  in  mass  storage 
and  communication  environments,  the  data  to  be  sorted  may  already  be 
in  optical  form.  In  conclusion,  optical  technology  will  be 
competitive  for  sorting  applications  with  large  numbers  of  channels 
and/or  high  bandwidth  signals. 

The  limiting  feature  of  optical  multistage  sorting  networks,  in 
contrast  to  electronic  implementations,  is  not  the  passive 
interconnection network, but the active processing performed in
parallel between each communication step. The advantages of optical
interconnections for sorting that we outlined above are dependent on
the  existence  of  optically  compatible  2-channel  sorting  elements,  the 


Compare-and-exchange rules:
Condition 1) if high >= low then high -> high and low -> low
Condition 2) if high < low then high -> low and low -> high

Figure 1. Compare-and-Exchange Module.
ANALOG IMPLEMENTATION OF COMPARE-AND-EXCHANGE MODULE:


Analog  implementations  of  C&E  have  been  proposed  for  associative 
memories and self-organizing systems.16 In this application, their
role  is  to  identify  the  element  of  a  vector  with  the  maximum  value 
using  a  binary-tree  architecture.  Knowledge-based  systems  that  do  not 
involve  high  accuracy  data  could  potentially  use  analog  C&E  and 
related  operations  for  sorting  and  logical  set  operations. 
Unfortunately, analog sorting systems require a system dynamic range
much  larger  than  the  dynamic  range  of  inputs.  For  the  moment,  let  us 
assume  that  we  desire  error-free  sorting  by  multistage  analog  C&E. 
Then  the  finite  accuracy  of  each  analog  comparison  calculation  in  the 
first  stage  places  an  initial  upper  bound  on  the  allowable  dynamic 
range  of  the  inputs.  In  addition,  multistage  analog  systems  lacking 
signal  restoration  accumulate  noise  which  further  limits  the  useful 
dynamic  range.  Hence,  noise  introduced  by  non-uniform  gain  and 
crosstalk  during  or  between  the  C&E  processes  is  the  most  serious 
limiting  factor  since  it  will  increase  with  each  stage  of  the 
calculation.  Clearly,  the  lowest  signal-to-noise  ratio  is  present  at 
the  last  stage  of  the  calculation  and  places  the  tightest  restrictions 
on  the  allowable  dynamic  range  of  the  inputs.  If  a  specific  dynamic 
range  is  desired  for  the  inputs  then  the  noise  introduced  by  the 
system limits the number of possible stages. Since the number of
stages is the logarithm squared or the logarithm of the number of
inputs in deterministic sorting and selection networks, respectively,
noise also limits the number of data channels. Because of these
apparent problems with analog approaches, we now turn to digital
implementations of C&E.

DIGITAL IMPLEMENTATIONS OF COMPARE-AND-EXCHANGE MODULE:

Digital implementations of C&E have several advantages over
analog approaches.17 A digital representation of data permits any
finite  dynamic  range  for  the  input  values  by  simply  specifying  the 
number  of  bits.  In  addition  digital  logic  can  restore  signal  levels, 
and hence, the C&E units can be cascaded indefinitely. If crosstalk
noise  in  the  network  is  low  and  independent  of  the  number  of  data 
channels,  indefinite  cascadability  implies  that  the  number  of  data 
channels  is  limited  only  by  device-physics  constraints  like  space  and 
power. In contrast to analog implementations, for bit-serial data the
internal complexity of the digital C&E modules remains constant
regardless  of  the  network  size  and  the  dynamic  range  of  the  inputs. 
Moreover,  the  low  fan-in  and  -out  of  the  bitonic  network  compensate 
for  the  low  contrast  and  gain,  respectively,  in  the  active  optical 
devices.  Thus,  the  device  requirements  of  the  digital  approach  to 
multistage  sorting  networks  appear  to  be  compatible  with  the 
characteristics  of  optical  and  electrooptical  technology.  In  this 
section we will show that C&E has simple, hardwired implementations
that  benefit  from  the  bistable  nature  of  many  optical  devices. 


In  a  digital  C&E  module  the  comparison  operation  can  be 
considered  as  a  search  for  the  most  significant  bit  mismatch  between 
the binary representations of the input data. The input word mismatch
occurring closest to the most significant bit determines which datum
is larger, and thus, the switch configuration. A rough outline of a
serial algorithm for digital C&E schematically shown in Figure 1 is as
follows:

1)  input  the  data  streams  A  and  B  most  significant  bit  first  into  the 
high/low  channels  of  the  C&E  module, 

2)  compare  the  two  channels  bit  by  bit;  at  the  first  occurrence  of  a 
mismatch between the strings, proceed to step 3;

3) if the mismatch is such that the A channel contains the larger
datum, place the switch in the barred configuration (i.e. non-exchange
position), otherwise the B channel contains the larger datum and place
the switch in the crossed configuration (i.e. exchange position).
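
The three steps above can be sketched as a short bit-serial simulation. It assumes that stream A enters the high channel and stream B the low channel, that both words have the same length, and that bits arrive most significant bit first; the function name bit_serial_compare_exchange and the decided flag (standing in for the feedback or flip-flop memory) are hypothetical.

```python
def bit_serial_compare_exchange(a_bits, b_bits):
    """Route two equal-length, MSB-first bit streams through a C&E module.

    Returns (high_out, low_out, crossed): the larger word leaves on the high
    output, the smaller on the low output; crossed is the final switch state.
    """
    assert len(a_bits) == len(b_bits)
    decided = False    # has the first (most significant) mismatch occurred?
    crossed = False    # False = barred (no exchange), True = crossed (exchange)
    high_out, low_out = [], []
    for a, b in zip(a_bits, b_bits):
        if not decided and a != b:
            decided = True
            crossed = b > a    # B holds the larger datum, so exchange
        # Routing can run concurrently with comparison: before the first
        # mismatch the two streams are identical, so either setting is right.
        if crossed:
            high_out.append(b)
            low_out.append(a)
        else:
            high_out.append(a)
            low_out.append(b)
    return high_out, low_out, crossed


if __name__ == "__main__":
    print(bit_serial_compare_exchange([0, 1, 0, 1], [0, 1, 1, 0]))
```

Once decided is set, later mismatches are ignored, which is the insensitivity requirement on the exchange switch.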

The  time  evolution  of  the  switch  position  is  illustrated  in 
Figure 3 for typical input streams. The input data streams can be
routed  by  the  exchange  switch  subsequent  to  or  even  concurrent  with 
comparison  since  up  until  the  first  mismatch  the  data  streams  are 
identical.  Once  the  most  significant  mismatch  has  been  detected  and 
the  exchange  switch  configuration  determined,  for  correct  operation 
the  exchange  switch  must  be  insensitive  to  any  subsequent  mismatches. 
This property can be achieved in a number of ways, the most common
being feedback of the previous result of the comparison operation.
Likewise, external flip-flops can record whether or not the mismatch
has occurred and in which direction the switch should be set. Hence
at the beginning of each word comparison the feedback signal or
flip-flops must be reset to signify the mismatch has yet to occur.

Figure 3. Time Evolution of Digital Compare and Exchange.

LATCHING  LOGIC  DESIGN  OF  COMPARISON  UNIT 

Setting  the  exchange  switch  in  a  particular  configuration  until 
reset  consists  of  remembering  whether  or  not  a  mismatch  has  occurred 
and  into  which  state,  crossed  or  barred,  the  switch  should  be  fixed. 
However,  to  reduce  the  complexity  of  the  circuit  we  propose  making  the 
memory  function  inherent  to  the  logic  devices  that  perform  comparison. 
This  can  be  accomplished  by  using  the  bistability  present  in  nonlinear 
logic  devices  with  internal  feedback.  For  example  the  transfer 
function  of  a  latching  AND  gate  is  shown  in  Figure  4(a).  While 
operating  in  latching  mode,  the  device  is  biased  up  into  the  bistable 
loop.  When  the  AND  condition  is  first  met,  the  state  of  the  switch 
shifts  to  the  upper  part  of  the  bistable  loop.  Since  subsequent 
removal  of  all  the  inputs  except  for  the  bias  does  not  change  the 
output  of  the  gate,  the  gate  is  effectively  latched  into  the  logical 
state true. Previously the only proposed uses for bistability in
optical computers were for delay lines or memory elements.18,19 By
extending  the  techniques  we  have  presented  here,  it  can  be  shown  that 
latching  NAND,  OR,  and  NOR  gates  are  possible  with  appropriate 
bistable  loops  and  bias  levels. 
Figure 4(a). Transfer Function of a Latching AND Gate.
Figure 4(b). Latching AND Implementation of Compare Operation.


Once  one  understands  how  latching  logic  works,  the  next  task  is 
to  build  latching  logic  circuits  that  perform  a  useful  function. 
Unfortunately, we do not know of any general methods to design
latching logic circuits based on a description of the intended circuit
function.  However,  we  succeeded  in  designing  the  latching  AND  circuit 
for  comparison  shown  in  Figure  4(b)  that  operates  in  the  following 
manner. The top and bottom first layer AND gates are designed to
latch at the first occurrence of the mismatches (Ai > Bi) and
(Bi > Ai), respectively. If (Ai > Bi) occurs before (Bi > Ai) then
the top gate latches to true while the output of the bottom gate is
unlatched at false. The latched output of the top gate is then
inverted to false, preventing the second layer gate from ever latching
to the state (exchange = true), regardless of changes in the state of
the bottom gate. Thus the output of the comparison module is
effectively latched to the state (exchange = false). Conversely, if
(Bi > Ai) occurs before (Ai > Bi) the bottom gate latches to true
while the top gate remains unlatched at false. Thus the second layer
gate, and with it the output of the comparison module, is directly
latched to the state (exchange = true). Since removal of the bias
causes all the latching gates to relax to the false state, the
complement of the inter-word reset signal should be used as the bias
to all of the latching gates. Since each gate latches at most once per
word, the switching duty cycle--and hence power dissipation in the
comparison module--decreases with increasing word length. Other
latching logic families, like those based on latching NAND gates, form
the basis of alternative comparison circuits.20
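
The behavior of the latching comparison circuit can be checked with an idealized logic-level model. This sketch makes several assumptions that are not in the text: each latching AND gate is reduced to a boolean state, the mismatch (Ai > Bi) is formed as Ai AND (NOT Bi) with complemented inputs assumed to be available, and the class and function names are hypothetical. It models only the logic, not the optical transfer function of Figure 4(a).

```python
class LatchingAnd:
    """Idealized latching AND gate: latches true once all inputs are true."""

    def __init__(self):
        self.state = False

    def step(self, bias, *inputs):
        if not bias:
            self.state = False    # removing the bias relaxes the gate to false
        elif all(inputs):
            self.state = True     # AND condition met: stays true until reset
        return self.state


class LatchingComparator:
    """Two first-layer gates and one second-layer gate, as in Figure 4(b)."""

    def __init__(self):
        self.top = LatchingAnd()       # latches on the first Ai > Bi
        self.bottom = LatchingAnd()    # latches on the first Bi > Ai
        self.second = LatchingAnd()    # latches exchange = true

    def step(self, a, b, reset=False):
        bias = not reset               # bias is the complement of the reset
        top = self.top.step(bias, a, not b)
        bottom = self.bottom.step(bias, b, not a)
        # The inverted top output blocks the second-layer gate once A > B.
        return self.second.step(bias, bottom, not top)


def compare(a_bits, b_bits):
    """Return True (exchange) if word B is larger than word A, MSB first."""
    comparator = LatchingComparator()
    comparator.step(False, False, reset=True)    # inter-word reset
    exchange = False
    for a, b in zip(a_bits, b_bits):
        exchange = comparator.step(bool(a), bool(b))
    return exchange


if __name__ == "__main__":
    print(compare([0, 1, 0, 1], [0, 1, 1, 0]))
    print(compare([1, 0, 0, 1], [0, 1, 1, 1]))
```

In this model each gate changes state at most once per word, which matches the observation that the switching duty cycle falls as the word length grows.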

OPTICAL  IMPLEMENTATIONS  OF  LATCHING  LOGIC  FOR  COMPARISON 

Latching  logic  gates  can  be  fabricated  using  a  variety  of  optical 
logic  technologies.  Each  device  technology  has  associated  with  it  a 
set  of  performance  characteristics  that  are  crucial  to  the  selection 
of  the  relevant  application  domain.  Among  the  critical 
characteristics  are  switching  speed,  power,  wavelength  of  operation, 
size  and  technological  maturity.  In  this  section,  we  will  highlight  a 
few  of  these  device  technologies  and  show  how  their  individual 
characteristics  limit  their  intended  applications. 

Bistable  Fabry-Perot  etalons  can  implement  latching  AND  gates  for 
the  compare  operation.  The  latching  circuit  we  outlined  above 
tolerates  the  low  gain  and  contrast  of  etalons  because  the  fan-out  and 
-in  required  of  the  latching  gates  is  at  most  one  and  three, 
respectively. Because of their high speed,21 nonlinear etalons are
well suited for switching broadband signals. The speed of the
latching circuits based on etalons may be limited by the cavity
build-up time required to reach the stable state. A device with a
nonsymmetric cycle time22 (fast switch-on and slow switch-off) is
useful if the packet frequency is small compared to the bit frequency.
Since at a fixed bandwidth the power dissipation in each comparison
module  decreases  with  increasing  header  length,  large  networks  based 
on  etalon  comparison  may  be  feasible.  However  as  we  shall  see  in  the 
next  section,  large  networks  demand  signal  restoration  whose  total 
power dissipation grows with the bandwidth and network size. In any
case, at a fixed bandwidth the power dissipation in the comparison
module decreases with decreasing packet frequency. This is especially
useful for applications that generate very long packets relatively
infrequently such as inter-computer communications and video
telecommunication.

Slightly  slower  speeds  for  comparison  are  possible  with  symmetric 
SEED devices serving as the latching AND gates.23 Just as with
bistable etalons, the latching SEED devices must wait for the positive
feedback (in this case electrical) to build up to place the output in
a stable state. Comparison units based on SEED devices offer a
variable speed/power tradeoff;24 therefore, higher levels of
integration  may  be  possible  for  low  bandwidth  signals  before  thermal 
dissipation  becomes  a  problem.  Thus,  SEED  arrays  appear  well  suited 
for  subscriber  loop  and  intra-computer  communications  where  the  data 
rates  are  relatively  lower  but  the  number  of  channels  is  higher  than 
the  previous  applications. 

Bistable  laser  diodes  also  possess  the  necessary  properties  to 
implement  latching  logic.  The  high  gain  and  contrast  of  laser  diodes 
make  them  particularly  well  suited  for  environments  where  the 
interstage connections create considerable losses and crosstalk. In
addition, laser diode manufacturing technology has demonstrated its
maturity, single mode fiber compatibility and ability to form 2-D
arrays of devices. However, the physical size and total power
dissipation  of  the  devices  can  be  quite  large,  preventing  their 
incorporation  into  very  large  integrated  structures.  Thus,  they  seem 
best  suited  for  the  trunk  and  inter-computer  communications 
applications  which  involve  a  small  number  of  high  bandwidth  channels. 

There  are  alternatives  for  comparison  implementation  that  are 
based  on  hybrid  logic  device  designs.  These  designs  detect  the 
incoming  light  signals  and  then  modulate  one  of  the  input  signals  or  a 
bias signal to produce the desired logic operations.25-27 The primary
advantage  of  this  approach  is  that  sophisticated  electronic 
processing  can  be  performed  on  the  detected  signal  before  it  is 
applied to the modulator. For instance, complex switching nodes for
store-and-forward packet switches may be attainable. The use of
special modulating materials with intrinsic memory characteristics
such as the ferroelectric liquid crystals and other inorganic
ferroelectric electrooptic materials will lead to latching logic
devices. This technology is, however, immature and most device
response times are on the order of milliseconds, making them too slow
for the applications under consideration. New developments in
materials research to enhance the response time and successful
incorporation of fast materials into functional devices will
necessarily lead to a re-evaluation of this technology.

OPTICAL  IMPLEMENTATION  OF  THE  EXCHANGE  UNIT 

Spatial  position  encoded  exchange  units  built  with  conventional 
non-latching  logic  can  restore  signal  levels.  Thus  only  signal  to 
noise,  crosstalk,  uniformity,  power  and  other  systems  engineering 
considerations  limit  the  number  of  channels  per  stage  and  the  total 
number  of  stages.  Because  noise  does  not  propagate  between  stages, 
restoring  exchange  applies  to  deep  networks.  The  schematic  circuit 
diagram  of  the  exchange  unit  is  shown  in  Figure  5.  The  AND  operation 
can  be  performed  by  any  nonlinear  optical  device  (all-optical  or 
hybrid)  with  a  sigmoidal  input-output  response  and  proper  biasing. 
The two OR gates in the second stage receive signals that are mutually
exclusive, and hence can be implemented by passive combiners. As the
logic expressions in Figure 5 indicate, the output H will be
equivalent to A (and output L equivalent to B) if the exchange signal
E is "0", and the signals at the output port will be interchanged if E
is "1". Like the technology that is available for comparison, the
networks  based  on  restoring  exchange  devices  span  the  spectrum  from 
narrowband  and  large  to  broadband  and  small. 
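
The behavior of the restoring exchange unit described above can be sketched in a few lines of Python. This is a minimal illustration only: the use of four AND gates in the first stage and the names `exchange_unit`, `a_straight`, `b_crossed` and so on are assumptions made for the example, since Figure 5 itself is not reproduced here.

```python
def exchange_unit(a, b, e):
    """Idealized restoring exchange unit (assumed gate arrangement).

    a, b: data bits on the two input channels (0 or 1)
    e:    exchange signal (0 = pass straight through, 1 = interchange)
    Returns the pair (h, l) of output bits.
    """
    not_e = 1 - e
    # First stage: AND gates select which input is routed to which output.
    a_straight = a & not_e  # A routed to H when E = 0
    b_straight = b & not_e  # B routed to L when E = 0
    a_crossed = a & e       # A routed to L when E = 1
    b_crossed = b & e       # B routed to H when E = 1
    # Second stage: the two inputs of each OR gate are mutually exclusive,
    # so the OR can be a passive combiner.
    h = a_straight | b_crossed
    l = b_straight | a_crossed
    return h, l
```

For example, `exchange_unit(1, 0, 0)` leaves the inputs in place, while `exchange_unit(1, 0, 1)` interchanges them.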


The  potential  throughput  per  channel  is  greatly  increased  if  the 
exchange  module  uses  passive  switches.  In  this  case,  the  bandwidth  of 
the  message  header  is  determined  by  the  response  time  of  the 
comparison  logic  while  the  trailing  information  can  propagate  at 
optical  media  bandwidths  within  the  signal-to-noise  limits  imposed  by 
losses in the passive switches. Polarization encoded switching using
Wollaston prisms and controllable half-wave plates9 is one technology
that performs passive routing. A photoactivated, polarization encoded
exchange unit is shown in Figure 6. A photodiode, photoconductor or
phototransistor  receives  the  exchange  signal  and  produces  an  electric 
field  dependent  change  in  the  polarizability  of  the  dynamic  half-wave 
plate  through  the  electrooptic  effect.  When  activated,  the  dynamic 
half-wave  plate  rotates  the  polarization  of  the  orthogonally  polarized 
signal  beams  through  90°,  thereby  acting  as  a  passive  switch.  The 
Wollaston prism or polarizing beamsplitter subsequently separates the
high  and  low  channels.  The  advantage  of  polarization  switching,  in 
addition  to  its  passive  nature,  is  that  exchange  occurs  in  one  stage 
and  the  data  may  occupy  the  same  spatial  channel.  Similarly,  the  data 
can  be  wavelength  multiplexed  for  further  increases  in  bandwidth.  In 
addition,  the  fan-out  of  the  previous  comparison  module  only  has  to  be 
one. However, the frame rate of optically controlled, dynamic half-wave
device arrays is presently constrained to the millisecond regime
by the material characteristics and the combined optical and
electrical switching power dissipation limitations. Since exchange
based on polarization-tagging is non-regenerative, the number of
stages and thereby the number of inputs is primarily limited by
absorptive, diffractive, sampling and scattering losses. Thus, such units
are  limited  to  small  networks  with  low  packet  frequency  but  high  data 
rates  like  inter-computer  communications  or  video  telecommunications. 
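
The polarization-encoded exchange can be illustrated with a small Jones-calculus sketch. It assumes ideal, lossless components, a half-wave plate with its fast axis at 45° when activated, and an arbitrary convention that the p component exits at the H port; these choices and all function names are assumptions for illustration, not details taken from Figure 6.

```python
import math

def half_wave_plate(axis_deg):
    """Jones matrix of an ideal half-wave plate (global phase dropped)."""
    c = math.cos(2.0 * math.radians(axis_deg))
    s = math.sin(2.0 * math.radians(axis_deg))
    return [[c, s], [s, -c]]

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def apply(matrix, vec):
    """Multiply a 2x2 Jones matrix by a 2-element Jones vector."""
    return [matrix[0][0] * vec[0] + matrix[0][1] * vec[1],
            matrix[1][0] * vec[0] + matrix[1][1] * vec[1]]

def polarization_exchange(a_amp, b_amp, e):
    """A enters p-polarized and B enters s-polarized on one spatial channel.

    e = 1 activates the dynamic half-wave plate (assumed fast axis at 45 deg),
    which swaps the two polarization components; e = 0 leaves them unchanged.
    Returns the intensities at the (assumed) H and L ports of the splitter.
    """
    field = [a_amp, b_amp]  # [p component, s component]
    plate = half_wave_plate(45.0) if e else IDENTITY
    out = apply(plate, field)
    h_intensity = out[0] ** 2  # p component leaves through the H port
    l_intensity = out[1] ** 2  # s component leaves through the L port
    return h_intensity, l_intensity
```

Because the plate only redirects the light and never regenerates it, any loss factor added to this model would accumulate from stage to stage, which is the limitation noted in the text.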

SUMMARY  AND  CONCLUSIONS 

In  this  paper  we  reviewed  why  optical  interconnects  are 
appropriate  to  implement  pipelined  sorting  networks  for 
telecommunication and parallel-processing applications. We went on to
propose optical implementations of the active compare-and-exchange
operation that is essential to the sorting networks. In particular,
we  described  a  class  of  Boolean  logic  devices  called  latching  logic 
which  permits  the  design  of  simple,  hardwired  comparison  modules. 
Latching  logic  significantly  reduces  the  interconnect  and  gate 
complexity  of  the  compare  module  over  the  non-latching  logic  approach. 
Based on the available device characteristics we outlined the
application domains of sorters utilizing a variety of optical
technologies. Which technology one chooses depends on the requirements
of the application of interest. Optics will
compete most favorably with electronics when the packets are long
and infrequent, and where low delay and high throughput are paramount,
for example video telecommunications and inter-processor message
routing. Optics also appears competitive at the other extreme of
intra-processor  and  subscriber-loop  communication  where  the  signals 
are  much  slower  but  involve  very  large  numbers  of  channels. 

All-Optical Compare-and-Exchange Switches

Lei Zhang, Ruxiang Jin, C.W. Stirk, G. Khitrova, R.A. Athale,
H.M. Gibbs, H.M. Chou, R.W. Sprague, and H.A. Macleod


Abstract — All-optical  compare  and  exchange  is  experimentally  demonstrated  using 
ZnS  bistable  optical  devices.  The  compare-and-exchange  demonstration  utilizes 
polarization  multiplexing  and  filtering,  and  latching  and  bidirectional  logic.  The 
combination  of  2-D  arrays  of  compare-and-exchange  modules  with  optical  perfect- 
shuffle  interconnections  leads  to  pipelined  optical  sorting  networks  that  can  process 
large  numbers  of  high-bandwidth  signals  in  parallel.  Optical  sorting  networks  with 
these  characteristics  are  applicable  in  telecommunication  switches,  parallel  processor 
interconnections  and  database  machines. 


The Arizona portion of this research was supported by DARPA/RADC, SDIO and OCC. The BDM portion
was funded by DARPA/AFOSR under Contract Number F49620-86-C-0030.

Lei Zhang was a visiting student at Optical Sciences Center, University of Arizona, Tucson, AZ 85721, from
the Department of Applied Physics, Harbin Institute of Technology, Harbin, People's Republic of China.
Ruxiang Jin, G. Khitrova, H.M. Gibbs, H.M. Chou, R.W. Sprague, and H.A. Macleod are with Optical
Sciences Center, University of Arizona, Tucson, AZ 85721.

C.W. Stirk and R.A. Athale are with The BDM Corporation, 7915 Jones Branch Drive, McLean, Virginia
22102-3396.




I.  Introduction 

Sorting  is  one  of  the  most  common  and  well-understood  topics  in  computer 
science.  It  is  known  that  serial  sorting  algorithms  require  at  least  O(NlogN)  temporal 
complexity  [1].  Hardware  based  on  parallel  sorting  algorithms  offers  enhanced 
performance  on  problems  that  must  rapidly  sort  large  quantities  of  information.  Since 
the  number  of  clock  cycles,  devices  and  interconnects  are  limited  resources  in  any 
processing  environment,  we  need  parallel  algorithms  with  sublinear  temporal  and 
practical  spatial  complexity.  In  addition,  the  algorithms  we  choose  must  be  optimum 
with  respect  to  our  specific  implementation  technology.  For  instance  the  mesh 
algorithms  developed  for  VLSI  require  only  nearest  neighbor  connections  and  are 
sublinear, $O(N^{1/2})$, in temporal complexity [2]. Unfortunately, mesh algorithms must
finish  sorting  one  sequence  before  beginning  another;  thus  their  throughput  is  limited 
by  their  latency.  On  the  other  hand,  the  shared  memory  [3]  and  some  network  [4] 
algorithms  have  the  lowest  temporal  complexity  O(logN)  of  all  sorting  algorithms,  but 
are  not  practical  with  current  technology  since  they  require  globally  reconfigurable 
interconnects  and  excessive  spatial  resources,  respectively. 

Network  algorithms  based  on  the  bitonic  sort  [5]  have  sublinear  temporal 
complexity $O(\log^2 N)$. Moreover, they can be pipelined in stages for high throughput
and are thus useful in problems where throughput is as critical as latency. But the
bitonic  sorting  network  requires  at  least  one  globally-connected  interstage  communication 
pattern.  For  instance  the  perfect-shuffle  [6]  connection  pattern  transmits  half  the 
information  present  in  the  top  half  of  a  list  to  the  bottom  half  and  vice  versa. 
Because  VLSI  is  confined  to  the  2-D  surface  of  a  chip  and  electrons  in  wires  are 
capacitively  coupled,  practical  electronic  perfect-shuffles  are  limited  to  small  numbers 
of  channels  and  low  data  rates.  In  contrast,  the  noninteracting  nature  of  photons  and 
3-D  connection  capability  of  optics  allows  optical  perfect-shuffle  networks  to  have 
large  numbers  of  parallel  channels  and  high  data  rates  [7]-[9].  Thus,  optical  sorting 
networks  based  on  the  perfect-shuffle  interconnection  and  bitonic  algorithm  are 
desirable  when  the  number  of  communication  channels  or  the  data  rates  exceed  the 
capabilities  of  electronic  systems. 
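
To make the stage count and the shuffle pattern concrete, here is a minimal software sketch of a bitonic sorting network together with a perfect-shuffle permutation. It is a standard index-arithmetic formulation of the bitonic sort, not the optical wiring itself; the function names are illustrative assumptions, and the input length is assumed to be a power of two.

```python
def perfect_shuffle(items):
    """Interleave the top half and bottom half of an even-length list."""
    half = len(items) // 2
    shuffled = []
    for top, bottom in zip(items[:half], items[half:]):
        shuffled.extend([top, bottom])
    return shuffled

def bitonic_sort(values):
    """Sort a power-of-two-length list in ascending order.

    Returns (sorted_list, stages), where stages is the number of parallel
    compare-and-exchange stages the network would need.
    """
    data = list(values)
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    stages = 0
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # distance between the two compared channels
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # One 2x2 compare-and-exchange module.
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            stages += 1    # all modules in this inner loop act in parallel
            j //= 2
        k *= 2
    return data, stages
```

For N = 8 inputs the network uses 6 stages, that is, log2(N)(log2(N)+1)/2, which is the $O(\log^2 N)$ temporal complexity quoted above. In a pipelined implementation a new input set can enter the first stage as soon as the previous set has moved on.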

In  particular,  optical  sorting  networks  are  applicable  in  telecommunication 
switches  that  route  high-bandwidth  optical  data  packets  [10].  Telecommunication 
switches  must  handle  many  parallel  channels,  have  low  latency  and  keep  up  with  the 
packet  generation  rates.  Similarly,  high-throughput  sorters  serve  as  the  communication 
fabric of electronic multiprocessors [11]. In these parallel processors the number of
processing  elements,  and  thereby  the  computational  power,  is  governed  by  the  number 
of  parallel  data  channels.  Furthermore,  the  throughput  of  each  processing  element  is 
limited  by  the  interconnection  latency  and  throughput.  In  addition,  sorting  hardware 
may  serve  as  dedicated  subsystems  for  parallel  database  operations  [12]  in 
conjunction  with  optical  memories  [13].  Parallel  and  independent  memory  access  can 
generate  data  rates  beyond  the  capabilities  of  electronic  systems. 

Network  sorting  algorithms  need,  in  addition  to  perfect-shuffle  interconnections, 
2x2  self-routing  crossbar  switches  where  each  routing  decision  depends  on  the 
relative  magnitude  of  the  local  information.  Hence,  we  desire  implementations  of  the 
2x2  self-routing  crossbars  that  are  compatible  with  optical  interstage  connections  and 
fulfill  the  requirements  of  bandwidth  and  parallelism  in  sorting  applications.  The 
function  of  such  self-routing  crossbars  can  be  separated  into  the  operations  of 
comparison  and  exchange:  comparison  determines  the  relative  magnitude  of  the  local 
data;  exchange  configures  the  crossbar  switch  dependent  on  the  outcome  of  the 
comparison. 

In  all  subsequent  discussions  we  assume  a  binary  representation  for  the  data. 
Overscores represent the invert operation; thus $\bar{R}_1$ and $\bar{R}_2$ are the logical
complements of the system reset. Brackets "[ ]" contain a latching condition which
we will explain shortly.

An  algorithm  for  compare  and  exchange  proceeds  as  follows:  we  label  the 
synchronous  input  channels  A  and  B,  and  operate  serially  from  most  to  least 
significant bit. If $A_i > B_i$ occurs before $B_i > A_i$, where i represents the bit position,
then the switch latches into the "don't-exchange" position. Conversely, if $B_i > A_i$
occurs first then the switch latches into the exchange position (Fig. 1). Latching
implies  that  once  an  inequality  has  been  detected,  the  exchange  switch  becomes  set 
into  one  particular  configuration  until  the  system  is  reset. 
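
The latching algorithm just described can be sketched as a small state machine. This is an idealized model with zero switching delay; the function name and the use of bit strings are assumptions for illustration. The two example inputs are the test vectors used in the experimental section of the paper.

```python
def compare_and_exchange(a_bits, b_bits):
    """Bit-serial compare and exchange, most significant bit first.

    a_bits, b_bits: equal-length strings of '0' and '1'.
    Returns (h_bits, l_bits, exchanged), where exchanged is True if the
    switch latched into the exchange position, False otherwise.
    """
    latched = None  # None = undecided, False = don't exchange, True = exchange
    h_bits, l_bits = [], []
    for a, b in zip(a_bits, b_bits):
        if latched is None:
            if a > b:       # A_i > B_i seen first: latch "don't exchange"
                latched = False
            elif b > a:     # B_i > A_i seen first: latch "exchange"
                latched = True
        # Once latched, later inequalities no longer change the switch.
        if latched:
            h_bits.append(b)
            l_bits.append(a)
        else:
            h_bits.append(a)
            l_bits.append(b)
    return "".join(h_bits), "".join(l_bits), bool(latched)
```

With A = 110001 and B = 101011 the switch latches into the don't-exchange position at the second bit; with A = 100011 and B = 110101 it latches into the exchange position, so the larger number appears on the H output in both cases.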

Optical bistable devices have the potential for high-speed optical signal
processing and computing [14]-[15]. ZnS and ZnSe bistable interference filters have
already been used to demonstrate simple digital optical circuits, pattern recognition,
symbolic substitution, and one-bit addition, because they can be operated in the
visible spectrum and are relatively easy to fabricate [16]-[17]. In this paper we
experimentally demonstrate a circuit that performs compare and exchange with ZnS
interference filters as bistable devices. Here the ZnS interference filters are used in
less common modes of operation including latching and bidirectional logic. In addition
we employ polarization multiplexing and filtering to achieve channel isolation, 4-port
bidirectional devices and reduced feedback. In the next section we outline the design
of one possible compare-and-exchange circuit without regard to the implementation
technology. We also illustrate the expected operation of each part of the circuit. In
the following section we present the general layout and operation of the
compare-and-exchange design using ZnS interference filters along with polarization
multiplexing and filtering. In the discussion section we compare the experimental and
expected results. We conclude with some general comments.

Figure 1. The compare-and-exchange module. E represents the exchange
signal. A, B represent the two input numbers, and H, L
represent the higher number and the lower number,
respectively.

II.  Compare-and-Exchange  Circuit  Design 

More than one circuit design is possible for comparison [18]. The circuit
diagram for the comparison circuit demonstrated in this paper is shown in Fig. 2. It
consists of three parts. The first part is a comparator to distinguish between the
cases where $A_i > B_i$ or $A_i < B_i$. Fig. 3 shows how this can be done by generating $A_i\bar{B}_i$
and $\bar{A}_i B_i$. In the second and third parts, two latching gates are employed, so that
when A<B, $\bar{A}_i B_i = 1$ comes first, and one latching gate will be switched on to give an
exchange signal E = 1. It remains in the on-state until all the bits of A and B are
transmitted. Similarly, when A>B, $A_i\bar{B}_i = 1$ comes first, and another latching gate will be
switched on to prevent the exchange. From Fig. 3 we see that $A_i\bar{B}_i = 1$ and $\bar{A}_i B_i = 1$
never occur simultaneously, making it possible to separate the states of the latching
gates.

As with comparison, there is more than one way to implement exchange.
The appropriate choice depends on the application requirements, the technology
characteristics and the corresponding comparison circuit. For demonstration purposes
we will construct an active exchange module. If the exchange signal E = 1 is present,
it sends B to the H-output and A to the L-output. Otherwise, if the exchange signal
is 0, it sends A to the H-output and B to the L-output.

The above discussion shows that our circuit design needs a comparator, two
latching gates, and an exchanger. Fig. 4 shows that $A_i\bar{B}_i$ and $\bar{A}_i B_i$ can be generated
from a single bistable etalon by using its reflections from both sides, so that a single
bistable device can be used as a comparator. The latching operation is also natural
for bistable devices, so that the two latching gates are just two bistable devices.
Another bistable device is used as the exchanger with its transmission determined by
the exchange signal. The details of their operations are discussed in the following
section.

Figure 3. The use of $A_i\bar{B}_i$ and $\bar{A}_i B_i$ to compare A and B. When
A<B, $\bar{A}_i B_i = 1$ will appear first. When A>B, $A_i\bar{B}_i = 1$
will appear first.

Figure 4. Generation of $A_i\bar{B}_i$ and $\bar{A}_i B_i$ using a single
bidirectional, reflection-mode Fabry-Perot etalon.


III.  Experimental  Demonstration 

The  experimental  layout  for  the  compare-and-exchange  circuit  is  shown  in  Fig.5. 
An  Argon  laser  and  a  phase  grating  generate  four  optical  beams,  each  having  a  peak 
power of about 40 mW. A chopper modulates beams A and B with the test sets, and
blocks the holding beams R1 and R2 between each test set to allow the latching gates
to reset. A half-wave plate gives the two holding beams R1 and R2 s-polarization. A
quarter-wave plate gives the beams A and B circular polarization.

For  the  experimental  demonstration  of  all-optical  compare  and  exchange  we 
choose test vectors of A = 110001, B = 101011 and A = 100011, B = 110101. In
the former set of test vectors $A_i > B_i$ occurs first; all four permutations of $A_i B_i$
follow to ensure that the switch is properly latched. Similarly, for the latter group of
test vectors we find that B > A and demonstrate the exchange stability to further
permutations.  Up  until  the  first  mismatch  the  position  of  the  exchange  switch  is  not 
important  to  first-order  approximation  since  the  output  data  streams  are  identical.  In 
Fig.6  we  depict  the  expected  operation  of  the  latching  compare  and  passive  circuits 
described  above  for  both  test  sets.  All  data  used  in  the  simulations  are  based  on  the 
structure  of  each  filter.  The  curves  are  drawn  upside  down  to  be  consistent  with  the 
experimental photographs. We see that whether A>B or A<B, the larger number
always  goes  to  the  H-output.  We  did  not  fit  the  simulations  with  the  experimental 
results  because  we  wanted  to  show  the  ideal  results  with  suitable  devices.  The 
transfer  functions  shown  in  Fig.  7  used  the  same  data. 

The compare circuit operates in the following manner. The circularly polarized
data beams, A and B, are incident on two polarizing beam splitters (PBS's). These
PBS's serve two functions. One function is to sample the data beams for the compare
operation: the p-polarization from the A channel propagates through the PBS for
comparison, the s-polarization is reflected to the exchange switch; conversely, the
s-polarization of the B beam is reflected for comparison and the p-polarization
propagates through the PBS to the exchange switch while its polarization is rotated
by the half-wave plate to match that of A. The orthogonally polarized data beams
that were injected into the compare circuit are converted to circular polarization by
two quarter-wave plates. The circularly polarized data beams are incident on the
first interference filter (IF1).

IF1 operates in reflection mode as a bidirectional comparator. If $B_i$ is zero and
$A_i$ is one, then the circularly polarized $A_i$ is reflected by IF1, converted to the
s-polarization by the quarter-wave plate and reflected by the PBS to produce the signal
$A_i\bar{B}_i$. In a similar fashion, if $A_i$ is zero and $B_i$ is one, then the circularly polarized $B_i$
is reflected by IF1, converted to the p-polarization by the quarter-wave plate and
transmitted through the PBS to produce the signal $\bar{A}_i B_i$. Thus the PBS's also function
as part of a bidirectional switch. Since the filter inputs are $A_i$ and $B_i$ and the
outputs are both $A_i\bar{B}_i$ and $\bar{A}_i B_i$, IF1 is a 4-port device.

Figure 5. Experimental layout for all-optical compare and exchange with
IF - ZnS interference filter, λ/2 - half-wave plate, λ/4 - quarter-wave
plate. E represents the exchange signal, A, B represent
the two binary encoded numbers, and H, L represent the outputs
of the larger number and the smaller number, respectively.

Figure 7. Computer simulated transfer functions for the ideal interference
filters. The horizontal axes are input power in arbitrary units and the
vertical axes are reflectivity or transmissivity of the filters.
(a) Reflectivity R1 of ...

$[R_1\overline{A\bar{B}}]$ is the exchange-prohibited signal from IF2, which works in reflection
mode as a latching NAND gate (see Fig. 7b). As long as $A_i\bar{B}_i = 0$, the reflection of
R1 is high; it has been polarization rotated so that it passes through the PBS, helping IF3
to switch on and latch into high transmission when $\bar{A}_i B_i$ becomes 1 (see Fig. 7c).
This is the exchange situation with E = 1. If $A_i\bar{B}_i$ becomes 1, IF2 switches on and
latches to have a low reflectivity; the reflection of R1 will be low thereafter. If this
takes place before the first occurrence of $\bar{A}_i B_i$ equal to 1, IF3 will never have enough
input power to switch on, and the output of exchange signal E will always be low
(see Fig. 10).
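
The interplay of the comparator IF1 and the two latching gates IF2 and IF3 can be traced with an idealized logic-level sketch. It ignores switching delays, optical power levels and contrast, and treats each filter as a perfect latch that holds until reset; the function and variable names are assumptions for illustration.

```python
def compare_unit(a_bits, b_bits):
    """Trace the latch outputs of the compare unit, bit by bit (MSB first).

    a_bits, b_bits: equal-length strings of '0' and '1'.
    Returns (prohibit_trace, e_trace) as lists of 0/1 values, one per bit:
    prohibit_trace is the exchange-prohibited signal reflected from IF2,
    e_trace is the exchange signal E transmitted through IF3.
    """
    if2_on = False  # IF2 switched on -> low reflectivity of R1
    if3_on = False  # IF3 switched on -> high transmission of R2
    prohibit_trace, e_trace = [], []
    for a_char, b_char in zip(a_bits, b_bits):
        a, b = int(a_char), int(b_char)
        a_not_b = a & (1 - b)   # IF1 output sent to IF2
        not_a_b = (1 - a) & b   # IF1 output sent to IF3
        if a_not_b:
            if2_on = True       # latches until the holding beam is blocked
        prohibit = 0 if if2_on else 1
        # IF3 needs both the reflected R1 and the comparator output to switch.
        if not_a_b and prohibit:
            if3_on = True       # latches until the holding beam is blocked
        prohibit_trace.append(prohibit)
        e_trace.append(1 if if3_on else 0)
    return prohibit_trace, e_trace
```

For the test vectors A = 110001, B = 101011 the model keeps E at 0 for all six bits, and for A = 100011, B = 110101 it raises E from the second bit onward.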

The final filter IF4 is the exchanger and works in both transmission and
reflection modes (see Figs. 7d-7f). If both $A_i$ and $B_i$ are 0, both outputs are 0
independent of the exchange signal. In the case that E = 0 and only one of $A_i$ and $B_i$
is 1, IF4 will not switch on; beam $A_i$ reflects to the high output on the right; beam
$B_i$ reflects to the low output on the left. If both $A_i$ and $B_i$ are 1, IF4 switches on and
has a high transmissivity and low reflectivity. Both sides have a high transmission
independent of the exchange status. The exchange control signal E will move the
transmission curve closer to the laser frequency. When E = 1, either signal (or both)
can switch on the gate; then $A_i$ and $B_i$ are transmitted to the opposite sides, in other
words they are exchanged.
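
One way to summarize the exchanger behavior is as a threshold gate: IF4 switches on when at least two of the three beams A, B and E are present. The threshold value of two unit powers is an assumption chosen to reproduce the cases listed above, not a measured device parameter, and the function name is illustrative.

```python
def exchanger_if4(a, b, e, threshold=2):
    """Idealized threshold model of the active exchanger IF4.

    a, b, e: 0/1 values for the data bits and the exchange signal.
    Returns (h, l, switched_on).
    """
    switched_on = (a + b + e) >= threshold
    if switched_on:
        # High transmissivity: each beam passes through to the opposite side.
        h, l = b, a
    else:
        # High reflectivity: A reflects to the H output, B to the L output.
        h, l = a, b
    return h, l, switched_on
```

When both data bits are 1 the filter switches on even with E = 0, but the two outputs are then identical, so the result still matches the ideal exchange function.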

Fig. 8 shows the results of the comparator. It demonstrates clearly that as soon
as there is a difference between $A_i$ and $B_i$, the comparator has a high output to the
following corresponding gate, which makes the appropriate decision. At the first bit,
$A_i$ and $B_i$ are equal. The output goes from a high reflection rapidly to a
low reflection and produces a sharp peak pulse at the rising edge of the output. If
the signal pulse width is large enough compared to the width of the sharp pulse, this
sharp pulse will not have enough power to switch the next stage. Before
each comparison of the input numbers, R1 and R2 are reset to 1. When the compare
and exchange is over, R1 and R2 are shut off and the system waits for the next
operation. Fig. 9 shows the exchange-prohibited signal $[R_1\overline{A\bar{B}}]$. Upon the first
occurrence of $A_i > B_i$, this signal latches to a low output. Fig. 10 shows the exchange
signal E, which is the transmission of R2. On the left part of Fig. 10, although two
cases of $B_i > A_i$ occur, E remains in its low state because the earlier occurrence of
$A_i > B_i$ latched the exchange-prohibited signal to 0. On the right part of Fig. 10, B is
larger than A. Upon the first occurrence of $A_i < B_i$, E is latched to 1, and therefore all
the following bits exchange their positions. The time delay at the rising edge of E is
caused by the switching speed of the device. Figs. 11 and 12 show the results of
compare and exchange in the two cases of A>B and A<B, respectively.

Figure 8. Experimental results of the inputs A, B and the logic outputs
$A\bar{B}$, $\bar{A}B$. The input powers are 11 mW each, and the
output power is about 5 mW.

Figure 9. Output of the exchange-prohibited signal $[R_1\overline{A\bar{B}}]$.
The upper two traces show the two groups of numbers coming into
the system. In the first group A is larger than B and in the
second one, B is larger than A. The power of the holding beam
is 19 mW, and the output power is about 6 mW.

IV.  Discussion 

From  the  experimental  results  of  the  all-optical  compare-and-exchange  circuit 
one  can  see  that  the  contrasts  are  not  as  good  as  those  in  the  simulations.  This  is 
because the filters used in the experiment were not specially designed for reflection-mode
operation. Therefore the low state of the reflection is higher than we expected.
However  even  with  such  non-optimal  filters,  the  system  worked.  The  contrasts  of  the 
outputs  could  be  better  by  using  specially  designed  reflection-mode  filters.  This 
would  also  decrease  the  power  required.  Another  bistable  optical  device,  with  its 
threshold  set  half  way  between  the  worst  case  levels  0  and  1,  could  amplify  the 
outputs  of  the  exchanger  as  well  as  enhance  the  contrast.  Then  the  outputs  could  be 
used  to  drive  the  next  compare-and-exchange  module  in  a  self-routing  optical 
network. 

It was not easy to obtain stable operation of all four interference filters
simultaneously for long enough to test the system, especially since the contrasts of the
devices are not very good. The data in Figs. 8-12 were taken with only the relevant
section working. Figs. 8-10 were taken from the compare unit consisting of IF1-IF3,
and Figs. 11-12 were the results from the exchange unit, IF4. While we were doing
the exchange, we used a third beam having the power consistent with the exchange
control signal E. Therefore, the experimental results do not show the delay seen in
the simulations. We also did the experiment with IF1 and IF3 producing a real
exchange signal for the last gate to show that when A<B, the exchanger works.

It was also not easy to focus A, B and E onto IF4 and have H and L come out
without energy losses when the respective polarization directions are considered. A
45° Faraday rotation glass and a half-wave plate placed between the exchange control
signal E and IF4 might solve this problem. A plane-polarized light beam passing
through the glass will have its polarization direction rotated through an angle θ
relative to the polarization direction of the incident beam. A beam coming from the
opposite direction will have its polarization rotated in the same direction. Let θ be
45° and the fast axis of the half-wave plate be at 67.5° with respect to the incident
plane of vibration of E. Then the polarization direction of E propagating to the right
will be rotated 90°, while that of L, which propagates to the left, will be unchanged.
A Faraday rotator as shown in the experimental layout was not available during the
experiment. Instead, only a half-wave plate was used to rotate the plane of
vibration of E by 90° and make E go to IF4, so that the system could work. By
slightly detuning the half-wave plate, a small transmission of L is detected.

Figure 10. Output of the exchange signal E. The holding power is 20 mW,
and the output power is 5 mW.

Figure 11. The high output and the low output of the system with A>B.
The input power is 14.5 mW, and the output power is 6.5 mW.

Figure 12. The high output and the low output of the system with B>A.
The input power is 14.5 mW, and the output power is 6.5 mW.
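
The rotation angles quoted for the Faraday rotator and half-wave plate can be checked with simple angle arithmetic. The sketch below makes several assumptions that are not stated in the text: all angles are measured in one fixed laboratory frame with E's incident polarization at 0°, E passes the rotator before the plate, L is polarized either parallel or perpendicular to E's incident plane, and all components are ideal. The function names are illustrative.

```python
def faraday_rotate(angle_deg, rotation_deg=45.0):
    """Non-reciprocal rotation: same sense in the lab frame for both directions."""
    return (angle_deg + rotation_deg) % 180.0

def half_wave_plate(angle_deg, axis_deg=67.5):
    """A half-wave plate mirrors the polarization direction about its fast axis."""
    return (2.0 * axis_deg - angle_deg) % 180.0

def forward(angle_deg):
    """Beam travelling to the right: rotator first, then plate (assumed order)."""
    return half_wave_plate(faraday_rotate(angle_deg))

def backward(angle_deg):
    """Beam travelling to the left: plate first, then rotator."""
    return faraday_rotate(half_wave_plate(angle_deg))
```

Under these assumptions a forward beam entering at 0° leaves at 90°, while a backward beam entering at 0° or 90° leaves with its polarization direction unchanged, in agreement with the statement above.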

The power of the exchange control signal E in our experiment is small
compared to those of numbers A and B, so the exchange operation in our case has
to be active. That is, when the exchanger is set to exchange status, it has to be
switched on for each following bit. The operation speed is then limited by this
switching speed. However, if E could be twice as large as A and B, the exchanger
could operate as follows. While E = 0, IF4 cannot be switched on and always keeps
a high reflectivity, so that the input signals are reflected back; the exchanger operates
like a mirror. While E = 1, IF4 switches on and keeps a high transmissivity, so that
the input signals are transmitted to the opposite sides and exchanged. The advantage
is that after the exchanger is set, the following data could have an extremely high-speed
transmission, since everything is linear after the exchange decision has been
made.

The polarization encoding is the key to the compare-and-exchange realization. It
not only reduces the energy losses in combining signals but also reduces the
influence of crosstalk and feedback. The effect of the unused transmissions and
reflections has been reduced to a minimum using polarization filtering. Transmissions
of A and B through IF1 and the transmission and reflection of E from IF4 propagate
back toward the source of A and B. The reflection of R2 from IF3 reflects directly
back. Half of the high transmission of R1 goes to IF1. But in this case, a decision has
already been made, and IF1 is no longer used until the next operation begins. The
transmission of AB through IF3 can propagate to IF2, which can only be high when
an exchange decision is made. Half of the transmission of AB from IF2 can
propagate to IF3, but this can be eliminated with an additional Faraday rotator, which
also prevents R1 from feeding back to IF1.

The  use  of  the  on-axis,  normal  incidence  makes  the  system  extendable  to 
operation  on  arrays,  so  that  two-dimensional  inputs  could  be  compared  and  exchanged 
at  the  same  time  in  parallel. 
By using 2-D arrays of bistable devices and folded perfect-shuffle
interconnections [18], optical sorting networks may be feasible for large numbers of
channels, but system engineering issues such as cascadability, uniformity,
crosstalk, reliability and heat dissipation must be addressed first.

The ZnS interference filters have relatively slow switching times (milliseconds)
because they are based on thermal nonlinearities, making real-system applications
unlikely. On the other hand, much faster compare-and-exchange modules based on
GaAs Fabry-Perot etalons [19] may increase the throughput of the sorting networks.
GaAs embodiments of the compare-and-exchange designs demonstrated here appear
ideal for packet-switching telecommunication networks because GaAs etalons are
diode-laser compatible [20] and allow rapid reconfiguration of very high-speed data
channels.

V.  Conclusions 

All-optical compare-and-exchange has been demonstrated using bistable optical
devices. The ZnS interference filters used operated at 3 ms per bit and required a
total four-filter power of about 100 mW. The experimental setup is extendable to
operation on arrays and to other bistable optical devices. The switching times might
be reduced to picoseconds using GaAs etalons, making the system more competitive
with alternative approaches.